Facets

PH345: Winter 2025

Phil Boonstra

Horse in Motion

Is there a moment where a galloping horse has all hooves off the ground?

Eadweard Muybridge, 1878. Public Domain

Chromosomes

Schematic representation of late-prophase chromosomes (1000-band stage) of man, chimpanzee, gorilla, and orangutan, arranged from left to right, respectively, to better visualize homology between the chromosomes of the great apes and the human complement.

Figure 2, Yunis and Prakash (1982)

What is this called?

[S]lice the data into parts according to one or more data dimensions, visualize each data slice separately, and then arrange the individual visualizations into a grid (Ch21, Wilke, 2019).

Called ‘small multiples’ (Tufte, 1991), ‘trellis plots’ (Becker et al., 1996), or ‘facet plots’ (Wickham, 2016).

Why do this?

Introduces a third dimension (first two dimensions being x and y). Essentially another aesthetic.

Tufte’s principles of graphical excellence:

  • Present many numbers in small space

  • Make large data sets coherent

What are guidelines?

Use when you want to focus audience attention on differences between groups

Faceting variable must be categorical

Each mini-plot should (normally) have the same structure: common axes, scales, etc.
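A continuous variable can still meet the "categorical" guideline if it is binned first. The following is a minimal sketch, not part of the original slides: it assumes ggplot2's built-in `mpg` data and uses `cut_number()` to split the continuous `displ` column into three bins of roughly equal size. The names `mpg_binned` and `displ_bin` are made up for illustration.

```r
library(ggplot2)

# Illustrative only: mpg ships with ggplot2; displ (engine size) is continuous
mpg_binned <- mpg

# cut_number() splits a numeric vector into n groups with roughly equal counts,
# returning a factor that can serve as a faceting variable
mpg_binned$displ_bin <- cut_number(mpg_binned$displ, n = 3)

p_binned <- ggplot(mpg_binned) +
  geom_point(aes(x = cty, y = hwy)) +
  facet_wrap(facets = vars(displ_bin))
p_binned
```

Each panel keeps the same x and y scales, so the bins can be compared directly.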

Hadley Wickham (1979-)

New Zealand statistician, Chief Scientist at Posit PBC

Creator of ggplot2 and the tidyverse

John Chambers Award for Statistical Computing (2006); Fellow of ASA (2015); COPSS Presidents’ Award (2019)

Two flavors of faceting in ggplot2

  1. facet_wrap(): wrap a 1D ribbon of plots into 2D grid
  2. facet_grid(): create a 2D grid of plots

See Chapter 16 of ggplot2 book (https://ggplot2-book.org/facet)
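The difference between the two flavors can be sketched side by side. This example is an illustration under assumptions, not taken from the slides: it uses ggplot2's built-in `mpg` data, and the object names `base`, `p_wrap`, and `p_grid` are hypothetical.

```r
library(ggplot2)

# Shared base plot, built on the mpg data that ships with ggplot2
base <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

# facet_wrap(): one faceting variable, panels laid out as a ribbon
# that wraps after 4 columns
p_wrap <- base + facet_wrap(facets = vars(class), ncol = 4)

# facet_grid(): two faceting variables, one defining the rows (drv)
# and one defining the columns (year)
p_grid <- base + facet_grid(rows = vars(drv), cols = vars(year))

p_wrap
p_grid
```

As a rule of thumb, `facet_wrap()` suits a single variable with many levels, while `facet_grid()` suits the crossing of two variables with few levels each.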

We’ve already seen facet_wrap()

# Load the tidyverse (ggplot2, dplyr, readr, forcats), used throughout
library(tidyverse)
# Install and load datasauRus if not already installed
if (!require(datasauRus)) {install.packages("datasauRus"); library(datasauRus)}
ggplot(datasaurus_dozen) +
  geom_point(aes(x = x, y = y), size = 1) + 
  facet_wrap(facets = vars(dataset), ncol = 5) +
  guides(color = "none") +
  theme(text = element_text(size = 20)) 

facet_grid() doesn’t ‘work’ here

# Install and load datasauRus if not already installed
if(!require(datasauRus)) {install.packages("datasauRus");library(datasauRus)}
ggplot(datasaurus_dozen) +
  geom_point(aes(x = x, y = y), size = 1) + 
  facet_grid(rows = vars(dataset)) +
  guides(color = "none") +
  theme(text = element_text(size = 20)) 

Doesn’t work as columns either

# datasauRus was already loaded on the previous slide
ggplot(datasaurus_dozen) +
  geom_point(aes(x = x, y = y), size = 1) + 
  facet_grid(cols = vars(dataset)) +
  guides(color = "none") +
  theme(text = element_text(size = 20)) 

Malaria

  • Spread by the bite of female Anopheles mosquitos, which are hosts to the malaria parasite

  • Exists in tropical and subtropical regions with inadequate public health infrastructure

  • 263 million cases in 2023; 597,000 deaths

US CDC: https://www.cdc.gov/malaria/data-research/index.html

World Malaria Report, WHO: https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2024

Global malaria incidence

WHO collects country-level data on malaria incidence.

Raw data on incidence per 1000 at-risk persons available at https://data.worldbank.org/indicator/SH.MLR.INCD.P3

Friendlier data available on Canvas (malaria_countries_long.csv)

An unfaceted plot of incidence: Sub-Saharan Africa

# I've downloaded the data from Canvas into my project folder:
malaria_countries <- read_csv("malaria_countries_long.csv")
malaria_countries_ssafrica <- malaria_countries %>% filter(region == "Sub-Saharan Africa")
ggplot(malaria_countries_ssafrica) +
  geom_line(aes(x = year, y = incidence, group = country, color = country), 
            alpha = 0.75)

A faceted plot of incidence: Sub-Saharan Africa

# Create ggplot object for future updating
p1 <-
  ggplot(data = malaria_countries_ssafrica,
         mapping = aes(x = year, y = incidence)) +
    geom_line() + 
    facet_wrap(vars(country)) +
    scale_y_continuous(name = "Incidence per 1000 at-risk persons")

p1

Drop the ‘strips’

# To drop the strips completely, add this to your code:
theme(strip.text = element_blank())
# Update p1 object from previous slide with the provided hints
p2 <- 
  p1 + theme(strip.text = element_blank())
p2

Add labels inside

# To add labels to the plots, use a dataset with just one row per country:
malaria_countries_ssafrica %>%
  group_by(country) %>%
  slice(1)
# Update p2 object from previous slide with the provided hints
p3 <- 
  p2 +
    # The '.' stands for the plot's default data (inherited from p1), which
    # we then modify further
    geom_text(data = . %>% group_by(country) %>% slice(1),
              aes(label = country), x = 2000, y = 650, hjust = 0, size = 3)+
    theme(strip.text = element_blank())
p3

Arrange countries by the change in incidence from 2000 to 2022

# Create a new variable called delta_incidence that is the difference 
# between the last and first incidence values for each country. 
# Then arrange the data by this variable and turn country into a factor
# with levels in the order that they appear in the data:

# Hint: mutate(country = factor(country) %>% fct_inorder())
malaria_countries_ssafrica2 <- 
  malaria_countries_ssafrica %>% 
  group_by(country) %>%
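  # Note we are using mutate not summarize since we want to keep the year-level
  # data. Because the data are grouped by `country`, the result will just
  # be constant within country. This assumes rows are sorted by year within
  # each country, so first() is the 2000 value and last() is the 2022 value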
  mutate(delta_incidence = last(incidence) - first(incidence)) %>% 
  ungroup() %>% 
  # Now sort by the calculated value, then country name, then year
  arrange(delta_incidence, country, year) %>% 
  # Now we recode country as a factor with levels in the order in which
  # the rows are arranged
  mutate(country = factor(country) %>% fct_inorder())

# Update p3 object from previous slide with new data
# Note use of new operator `%+%` for updating data. Type ?'%+%' for more information
p4 <- p3 %+% malaria_countries_ssafrica2
p4
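To see why the arrange-then-`fct_inorder()` step controls panel order, here is a minimal sketch on made-up data. The tibble `toy` and its values are hypothetical and are not taken from the malaria file; the pipeline itself mirrors the one above.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: two years for each of three countries
toy <- tibble(
  country   = rep(c("A", "B", "C"), each = 2),
  year      = rep(c(2000, 2022), times = 3),
  incidence = c(100, 120,   # A: change of +20
                300, 150,   # B: change of -150
                200, 190)   # C: change of -10
)

toy_ordered <- toy %>%
  group_by(country) %>%
  # Constant within country because the data are grouped
  mutate(delta_incidence = last(incidence) - first(incidence)) %>%
  ungroup() %>%
  # Largest decrease first
  arrange(delta_incidence, country, year) %>%
  # Factor levels follow the order of first appearance in the sorted rows
  mutate(country = fct_inorder(factor(country)))

levels(toy_ordered$country)
# "B" "C" "A"
```

Because `facet_wrap()` lays out panels in the order of the factor levels, the country with the largest drop in incidence appears first.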

Anchor all panels with the Sub-Saharan-wide trajectory

# I've downloaded the data from Canvas into my project folder:
malaria_all_ssafrica <- read_csv("Malaria/malaria_other_long.csv") %>% filter(name == "Sub-Saharan Africa")
p4 + 
  geom_line(data = malaria_all_ssafrica,
            linetype = "dashed", alpha = 0.75) +
  scale_x_continuous(breaks = c(2000, 2010, 2020))
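The anchoring trick relies on how ggplot2 treats a layer whose data lacks the faceting variable: that layer is drawn in every panel. A minimal sketch on made-up data follows; `panels`, `reference`, and `p_anchor` are hypothetical names and the values are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data: one short series for each of three groups
panels <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  x     = rep(1:3, times = 3),
  y     = c(1, 2, 3,  3, 2, 1,  2, 2, 2)
)

# Reference series: note that it has no `group` column
reference <- data.frame(x = 1:3, y = c(2, 2.5, 3))

p_anchor <- ggplot(panels, aes(x = x, y = y)) +
  geom_line() +
  # This layer's data lacks the faceting variable, so it repeats in all panels
  geom_line(data = reference, linetype = "dashed") +
  facet_wrap(vars(group))
p_anchor
```

The dashed line is identical in each panel, which gives the eye a fixed point of comparison across groups.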

An example with varying y-axes

ggplot(data = malaria_countries_ssafrica,
       mapping = aes(x = year, y = incidence)) +
  geom_line() + 
  facet_wrap(vars(country), scales = "free_y") +
  scale_y_continuous(name = "Incidence per 1000 at-risk persons")

Think about the trade-off when letting y-axes vary: with varying y-axes, the panels are essentially many individual plots. Varying y-axes make it easier to evaluate differences within a panel but harder to compare values between panels.

Unemployment in USA

Jorge Camoes, https://excelcharts.com/charts-monthly-unemployment-rates-by-state-1976-2009/

Facets wrapped by US state, ordered by size of workforce. Easy to compare trends across states. The unemployment band provides an anchor.

Life expectancy in USA

Anna Maria Barry-Jester, https://fivethirtyeight.com/features/as-u-s-life-expectancies-climb-people-in-a-few-places-are-dying-younger/

Sort of a facet grid by categorized latitude and longitude. The trend line provides an anchor. The amounts of purple and orange allow the reader to immediately determine

Code Together Task

No Spice: Make the standard faceted plot on slide 15

Weak Sauce: Make the faceted plot without strips on slide 16; make the faceted plot with varying y-axes on slide 20

Medium Spice: Make the faceted plot with inside labels on slide 17

Yoga Flame: Make the faceted plot with reordered facets on slide 18

Dim Mak: Make the faceted plot with the Sub-Saharan Africa trajectory on slide 19

References

Becker, R.A., Cleveland, W.S. and Shyu, M.J., 1996. The visual design and control of trellis display. Journal of Computational and Graphical Statistics, 5(2), pp.123-155.

Tufte, E.R., 1991. Envisioning information. Optometry and Vision Science, 68(4), pp.322-324.

Wickham, H., 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

Wilke, C.O., 2019. Fundamentals of data visualization: a primer on making informative and compelling figures. O’Reilly Media.

Yunis, J.J. and Prakash, O., 1982. The origin of man: a chromosomal pictorial legacy. Science, 215(4539), pp.1525-1530.